Revisiting the D-iteration method: from theoretical to practical computation cost

ABSTRACT

In this paper, we revisit the D-iteration algorithm in order
to better explain its connection to the Gauss-Seidel method
and the different performance results that were observed. In
particular, we study the practical computation cost, based on
the execution runtime, compared to the theoretical number of
iterations. We also propose an exact formula of the error for
the PageRank class of equations.

Categories and Subject Descriptors 

G.1.3 [Mathematics of Computing]: Numerical Analysis - Numerical
Linear Algebra; G.2.2 [Discrete Mathematics]: Graph Theory - Graph
algorithms

General Terms 

Algorithms, Performance 

Keywords 

Numerical computation; Iteration; Fixed point; Gauss-Seidel; 
Eigenvector. 

1. INTRODUCTION 

In this paper, we assume that the readers are already fa- 
miliar with the idea of the fluid diffusion associated to the 
D-iteration ^ to solve the equation: 

X = P.X + B 

and its application to PageRank equation [4]. 

For the general description of alternative or existing iter- 
ation methods, one may refer to [2l[Q. 

In Section 2 we explain the exact connection between
the D-iteration and the Gauss-Seidel iteration. Section 3
presents the formula for the exact error (distance to the
limit in the L1 norm) for PageRank type equations. Section 4
presents the analysis of the computation cost.

2. CONNECTION TO GAUSS-SEIDEL ITERATION

We recall the equation on the history vector H_n associated
to the D-iteration method:

    H_n = (I_d - J_{i_n}(I_d - P)) H_{n-1} + J_{i_n} F_0        (1)

where I_d is the identity matrix, J_k a matrix with all entries
equal to zero except for the k-th diagonal term: (J_k)_{kk} = 1,
F_0 the initial condition vector (equal to B) and i_n the
n-th choice of node for the diffusion. The choice of the
sequence I = {i_1, i_2, ..., i_n, ...} with i_n ∈ {1, .., N} is the
main optimization factor of the D-iteration.

The above equation (1) is in fact equivalent to:

    If i ≠ i_n : (H_n)_i = (H_{n-1})_i
    If i = i_n : (H_n)_{i_n} = L_{i_n}(P) H_{n-1} + (B)_{i_n}

where L_i(P) is the i-th line vector of P. This is exactly the
Gauss-Seidel iteration equation if we apply the diagonal term
elimination (division by 1 - p_{ii}). This means that the history
vector H_n obtained with the D-iteration is exactly the same as
the Gauss-Seidel result when the same sequence I is applied. In
particular, it means that one can apply the Gauss-Seidel method
for any infinite sequence I and the limit is not modified. However,
the main difference is that with the D-iteration we do not use
the line-vector equation above. Instead, we use the column vector
C_i(P) to update (F_n, H_n): the advantage of introducing and
working with F_n is that (cf. pseudo-code [3]):

• we know exactly in advance the amount of fluid on
which the diffusion is applied (and therefore its consequence:
how much remains and how much disappears): if we want to
optimize the way the sequence I is built from H_n alone
(with the line vector application), it is not obvious how to
do so (this explains why up to now only the cyclic iteration
is done), and it would in fact require the information of F_n;

• the computation cost is reduced: this will be illustrated
below. The main reason is that when the diffusion is applied,
each computation is useful (each diffusion effectively adds
fluid to its children nodes), whereas with the line vector
application on H_n, we may have a lot of redundant computation.
To understand this last idea, assume that x% of the N nodes
are constant or almost constant. With the line vector L_i(P)
application, there will be x% of operations that are repeated
and that could be avoided if the diffusion approach is applied
(cf. results comparison of Section 4.7).
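
The equivalence stated in this section can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a small dense matrix (the paper works with sparse iterators), keeps the diagonal terms, and uses the hypothetical helper names diffuse, collect and max_gap_cyclic. It applies a diffusion (column use) and a collection (line use) with the same cyclic sequence and records the largest gap between the two history vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense storage is an assumption made only to keep the sketch short.
typedef std::vector<std::vector<double> > Matrix;

// Diffusion on node i: uses the i-th column of P to update (F, H).
void diffuse(const Matrix& P, std::size_t i,
             std::vector<double>& F, std::vector<double>& H) {
    const double f = F[i];
    H[i] += f;                       // fluid received so far by node i
    F[i] = 0.0;
    for (std::size_t j = 0; j < P.size(); ++j)
        F[j] += P[j][i] * f;         // send fluid along column C_i(P)
}

// Collection on node i: (H)_i = L_i(P) H + (B)_i, diagonal term kept.
void collect(const Matrix& P, const std::vector<double>& B,
             std::size_t i, std::vector<double>& H) {
    double s = B[i];
    for (std::size_t j = 0; j < P.size(); ++j)
        s += P[i][j] * H[j];
    H[i] = s;
}

// Largest componentwise gap between both history vectors over
// `sweeps` cyclic passes (hypothetical helper for this illustration).
double max_gap_cyclic(const Matrix& P, const std::vector<double>& B,
                      int sweeps) {
    const std::size_t N = P.size();
    assert(B.size() == N);
    std::vector<double> F(B), Hd(N, 0.0), Hc(N, 0.0);
    double gap = 0.0;
    for (int s = 0; s < sweeps; ++s) {
        for (std::size_t i = 0; i < N; ++i) {
            diffuse(P, i, F, Hd);
            collect(P, B, i, Hc);
            for (std::size_t k = 0; k < N; ++k)
                gap = std::max(gap, std::fabs(Hd[k] - Hc[k]));
        }
    }
    return gap;
}
```

Under these assumptions the gap should stay at the level of floating-point rounding, since F_n = F_0 - (I_d - P) H_n holds after every diffusion.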

3. IMPROVING THE ERROR ESTIMATE 

It has been shown in [J that r/(1 - d) is an exact distance
(for the L1 norm) to the limit when all columns of P sum up
to d. In case of the PageRank equation, we may have
zero-column vectors in P (if we do not do the P completion
operation, cf. [S]). Indeed, if the zero-column vectors of P
(corresponding to dangling nodes) are completed by d/N, any
iteration scheme would do useless computations. When we work
on the P matrix without completion, the limit we obtain needs
to be renormalized (by a constant multiplication for the
diffusion approach or by a constant addition for the power
iteration).

To take this effect into account precisely, we count the
total amount of fluid that left the system when a diffusion
is applied on a dangling node: we call this quantity e_n (at
step n of the D-iteration). This quantity should have been
put back in the system by adding e_n × d/N on each node, which
means that the initial fluid should have been (1 - d + d e_n)/N
instead of (1 - d)/N. But then the added fluid d e_n/N would
itself have produced, after n steps, a further amount
(d e_n)^2/(1 - d)/N to be compensated for what disappears
through dangling nodes, and so on. Applying the argument
recursively, the correction that is required on the residual
fluid r_n (equal to |F_n|) is to replace the initial condition
r_0 = 1 - d by:



    (1 - d) + d e_n + d e_n (d e_n/(1 - d)) + d e_n (d e_n/(1 - d))^2 + ...
        = (1 - d)^2/(1 - d - d e_n).



And H_n needs to be renormalized (multiplied) by
(1 - d)/(1 - d - d e_n), so that the exact L1 distance
|H_∞ - H_n| is equal to:

    |H_∞ - (1 - d)/(1 - d - d e_n) H_n| = r_n/(1 - d - d e_n).

Below, in the D-iteration approach, we updated e_n by:

if i_n is a dangling node.
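
As an illustration of this error estimate, here is a minimal sketch under stated assumptions. It assumes that e_n simply accumulates the fluid diffused on dangling nodes, that is e_n = e_{n-1} + (F_{n-1})_{i_n} when i_n is dangling; this update rule is our reading of the text, not a formula taken from it. The names Entry, Columns, State, init_state, scale and distance are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sparse column of P: children of a node with weights p_ji
// (weights already include the damping factor d).
struct Entry { std::size_t row; double weight; };
typedef std::vector<std::vector<Entry> > Columns;

struct State {
    std::vector<double> F, H;
    double r;  // residual fluid |F_n|
    double e;  // fluid lost on dangling nodes (assumed update rule)
};

// PageRank initial condition without completion: F_0 = (1 - d)/N.
State init_state(std::size_t N, double d) {
    State s;
    s.F.assign(N, (1.0 - d) / N);
    s.H.assign(N, 0.0);
    s.r = 1.0 - d;
    s.e = 0.0;
    return s;
}

// One diffusion on node i, tracking r and e.
void diffuse(const Columns& cols, std::size_t i, State& s) {
    const double f = s.F[i];
    s.H[i] += f;
    s.F[i] = 0.0;
    s.r -= f;
    if (cols[i].empty()) {   // dangling node: the fluid leaves the system
        s.e += f;
        return;
    }
    for (std::size_t k = 0; k < cols[i].size(); ++k) {
        s.F[cols[i][k].row] += cols[i][k].weight * f;
        s.r += cols[i][k].weight * f;
    }
}

// Renormalization factor (1 - d)/(1 - d - d e_n) applied to H_n.
double scale(const State& s, double d) {
    return (1.0 - d) / (1.0 - d - d * s.e);
}

// Distance to the limit r_n/(1 - d - d e_n).
double distance(const State& s, double d) {
    return s.r / (1.0 - d - d * s.e);
}
```

On a two-node graph where node 0 links to node 1 and node 1 is dangling, with d = 0.5, the completed PageRank vector is (0.4, 0.6); two diffusions followed by the renormalization recover it in this sketch.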

4. ANALYSIS OF THE COMPUTATION COST 
4.1 Web graph dataset 

For the evaluation purpose, we used the web graph imported
from the dataset uk-2007-05@1000000 (available on [1]),
which has 41,247,159 links on 1,000,000 nodes.

Below we vary N from 10^3 to 10^6, extracting from the
dataset the information on the first N nodes. A few graph
properties are summarized in Table 1:

• L: number of non-null entries (links) of P;

• D: number of dangling nodes (0 out-degree nodes);

• E: number of 0 in-degree nodes: the 0 in-degree nodes
are defined recursively: a node i having incoming links
only from nodes that are all 0 in-degree nodes is also a
0 in-degree node; from the diffusion point of view, those
nodes are those that converge exactly in finite steps;

• O: number of loop nodes (p_{ii} ≠ 0);

• max_in = max_i #in_i (maximum in-degree; the in-degree
of i is the number of non-null entries of the i-th line
vector of P);

• max_out = max_i #out_i (maximum out-degree; the out-degree
of i is the number of non-null entries of the i-th
column vector of P).
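
The recursive definition of the 0 in-degree nodes can be sketched as a queue-based peeling of the graph. This is only an illustration under assumptions: the input format (a list of children per node) and the function name zero_in_degree_nodes are hypothetical and not taken from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// children[i] lists the nodes that node i links to (assumed input format).
// Returns, for each node, whether it is a 0 in-degree node in the
// recursive sense: all its incoming links come from 0 in-degree nodes.
std::vector<bool> zero_in_degree_nodes(
        const std::vector<std::vector<std::size_t> >& children) {
    const std::size_t N = children.size();
    std::vector<std::size_t> pending(N, 0);  // unresolved incoming links
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t k = 0; k < children[i].size(); ++k) {
            assert(children[i][k] < N);
            ++pending[children[i][k]];
        }
    }
    std::vector<bool> is_zero(N, false);
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < N; ++i) {
        if (pending[i] == 0) { is_zero[i] = true; ready.push(i); }
    }
    while (!ready.empty()) {
        const std::size_t i = ready.front();
        ready.pop();
        for (std::size_t k = 0; k < children[i].size(); ++k) {
            const std::size_t c = children[i][k];
            if (--pending[c] == 0) { is_zero[c] = true; ready.push(c); }
        }
    }
    return is_zero;
}
```

The order in which nodes leave the queue is also an order in which each of them needs to be diffused exactly once; nodes on a cycle (including loop nodes) are never marked.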



N       L/N     D/N     E/N     O/N     max_in    max_out
10^3    12.9    0.041   0.032   0.236   716       130
10^4    12.5    0.008   0.145   0.114   7982      751
10^5    31.4    0.027   0.016   0.175   34764     3782
10^6    41.2    0.046           0.204   403441    4655

Table 1: Extracted graph: N = 10^3 to 10^6.




Figure 1: P matrix associated to uk-2007-05@1000000:
random sampling of 100000 links (x-axis: matrix column).



4.2 Programming environment 

For the evaluation of the computation cost, we used C-\- + 
codes {g + -\ — 3.3) on a Linux (Ubuntu) machine: 

Intel(R) Core(TM)2 CPU, U7600, 1.20GHz, cache size 2048 KB.

The runtime has been measured with the function clock() of
the library time.h (with a precision of 10 ms). The
runtime below measures the computation time from the moment
the iterations start (the time to build the iterators is excluded).
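
For readers who want to reproduce this kind of measurement, a minimal sketch of timing with clock() follows. The helper names time_call and dummy_iterations are hypothetical, and the workload is a stand-in, not the iteration code of the paper.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: runs work(arg), stores its value in *result and
// returns the processor time used, in seconds, as measured by clock().
double time_call(double (*work)(int), int arg, double* result) {
    assert(work != 0 && result != 0);
    const std::clock_t start = std::clock();
    *result = work(arg);
    const std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}

// Stand-in workload (an assumption): a short affine recurrence.
double dummy_iterations(int n) {
    double x = 1.0;
    for (int i = 0; i < n; ++i)
        x = 0.5 * x + 1.0;
    return x;
}
```

Returning the computed value to the caller keeps the compiler from discarding the loop when an optimization option is used, which matters when comparing optimized and non-optimized builds.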

4.3 First comparison tests 

The first algorithms that we evaluated in this section are:

• PI: Power iteration (equivalent to Jacobi iteration);

• GS: Gauss-Seidel iteration (cyclic sequence);

• GS': Gauss-Seidel iteration (cyclic sequence), keeping
the diagonal terms;

• DI-CYC: D-iteration with cyclic sequence (a node i is
selected if (F_n)_i > 0);

• DI-MAX: D-iteration with i_n = argmax_i (F_{n-1})_i by
threshold (by threshold means that we apply the diffusion
to all nodes above the threshold value; when there is no
such node, we decrease the threshold multiplicatively,
by default by 1.2, which we call the decrement factor);

• DI-OP: D-iteration with i_n = argmax_i (F_{n-1})_i/(H_{n-1})_i
by threshold; diffusions are applied first on the 0 in-degree
nodes (recursively, such that we need to apply
the diffusion exactly once to each of those nodes).
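
The "by threshold" selection of DI-MAX can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes non-negative B and P with column sums below 1, starts the threshold at the maximum of F_0 (an assumption), and recomputes the residual with a full sum at each pass for simplicity. The names Entry, Columns and di_max_threshold are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sparse column of P: children of a node with weights p_ji.
struct Entry { std::size_t row; double weight; };
typedef std::vector<std::vector<Entry> > Columns;

// D-iteration with threshold-based selection: diffuse every node whose
// fluid is above the threshold; when no node qualifies, divide the
// threshold by the decrement factor. Stops when |F| <= target_residual.
std::vector<double> di_max_threshold(const Columns& cols,
                                     const std::vector<double>& B,
                                     double target_residual,
                                     double decrement = 1.2) {
    const std::size_t N = cols.size();
    assert(N > 0 && B.size() == N);
    assert(target_residual > 0.0 && decrement > 1.0);
    std::vector<double> F(B), H(N, 0.0);
    double threshold = *std::max_element(F.begin(), F.end());
    for (;;) {
        double residual = 0.0;
        for (std::size_t i = 0; i < N; ++i) residual += std::fabs(F[i]);
        if (residual <= target_residual) return H;
        bool diffused = false;
        for (std::size_t i = 0; i < N; ++i) {
            if (F[i] <= threshold) continue;
            const double f = F[i];
            H[i] += f;
            F[i] = 0.0;
            for (std::size_t k = 0; k < cols[i].size(); ++k)
                F[cols[i][k].row] += cols[i][k].weight * f;
            diffused = true;
        }
        if (!diffused) threshold /= decrement;
    }
}
```

A larger decrement factor lowers the threshold faster, so fewer passes are spent finding no node at all, at the price of a less selective choice of nodes.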



Because we are currently limited by the memory size of
a single PC, we introduced an arbitrary function, called each
time a non-zero entry of P needs to be used, which runs a finite
iteration of

    // a and b are arbitrary constants
    // (the values below are illustrative assumptions).
    const double a = 0.5, b = 1.0;

    void ArbitraryFunction(int m){
      double x = 1.0;
      for (int i = 0; i < m; i++){
        x = a * x + b;
      }
    }

(m is the number of times we iterate): this would correspond
exactly to the reality if the operation p_{ij} × x_j were
replaced by an operator f_{ij}(x_j) whose computation cost
is exactly the finite iteration we introduced arbitrarily.

As one can observe in the results of Tables 2 and 3, the
main improvements are brought by:

• the coordinate level update with the iterations GS and
DI-CYC (against the vector level update of PI);

• a better choice of the nodes for the diffusion process:
we see a significant jump with the very basic solution
DI-MAX;

• then DI-OP (we observed similar improvements
with all variants of the idea of the argmax of weighted
fluid).

The impact of the diagonal term elimination and the impact
of the redundant computation of line-vector operations
in GS are limited here.

We also see that the computation cost estimate based on the
number of iterations is a very good approximation when
the matrix product operations (the multiplications) are the
dominant component of the runtime cost (when m is introduced).

When we set m = 0 (real runtime), we see that the real
speed-up gain may in fact be much larger than those
estimates (for large N). Also, we can notice a surprising
efficiency of DI-CYC: while the main speed-up gain in the
number of iterations comes from DI-MAX, the main improvement
in runtime is brought by DI-CYC.

Figures 2 and 3 show the evolution of the distance to the
limit w.r.t. the computation costs: as we said, we can observe
that the number of iterations is a very good estimate
of the real cost when the multiplication operations with the
entries of the matrix are the dominant component of the
computation.

The visible difference between the number of iterations and
the runtime for DI-CYC noticed above can be explained by the
fact that, from the practical numerical computation point of
view, it is not necessarily good to look for the optimal
sequence I: instead of spending time selecting the best nodes
for the diffusion, it is better to quickly choose suboptimal
nodes for the diffusion: the simplest is DI-CYC, but we will
see that we can do better.

4.4 Second comparison tests 

Based on the first observation, we next re-evaluated the
computation cost, decreasing the cost of the node selection:
either by increasing the threshold decrement factor (DI-OP2,
DI-MAX2) or by taking all nodes above a certain average
(and not choosing the threshold from the maximum) with
DI-OP3, with the idea of wasting less time to find the
optimal nodes for diffusion.

time (s)    223    198    45     49     27
speed-up           1.1    5.0    4.6    8.3

Table 2: Comparison of the runtime for a target
error of 1/N. m = 1000000, 100000, 10000, 1000, 0. GS is
computed here without diagonal terms elimination.
speed-up: gain factor w.r.t. PI.

time (s)    223    147    31     44     36
speed-up    1.0    1.5    7.2    5.1    6.0

Table 3: Comparison of the runtime for a target
error of 1/N. m = 1000000, 100000, 10000, 1000. Except
for PI, we applied the diagonal terms elimination to
all other approaches.

Figure 2: Comparison for N = 1000000: computation
cost in number of iterations.

Figure 3: Comparison for N = 1000000: computation
cost in time, m = 10^.

                     DI-OP2   DI-OP3   DI-MAX2
            18.3
nb iter     15.3     20.4     15.8     15.8
time        36       20       18       39

Table 4: Comparison of the runtime (in seconds) for
a target error of 1/N. m = 0. The diagonal terms
elimination is applied to all approaches.

Below, the different algorithms for the second tests:

• DI-OP: D-iteration with i_n = argmax_i (F_{n-1})_i/(H_{n-1})_i
by threshold (decrement factor 1.2); diffusions are applied
first on the 0 in-degree nodes (recursively, such
that they need to send exactly once);

• DI-OP2: as DI-OP, decrement factor 10.0;

• DI-OP3: D-iteration with node selection if (F_n)_i >
r_{n'}/N × 0.9; the value r_{n'} is the remaining fluid value
computed per cycle n';

• DI-MAX2: as DI-MAX, with node selection if (F_n)_i >
max_i (F_n)_i/10.
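
The DI-OP3 selection rule can be sketched as below. This is a minimal illustration under assumptions: B and P are non-negative, the residual r_{n'} is recomputed by a full sum at the start of each cycle, and the prior diffusion of the 0 in-degree nodes is left out. The names Entry, Columns and di_op3 are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sparse column of P: children of a node with weights p_ji.
struct Entry { std::size_t row; double weight; };
typedef std::vector<std::vector<Entry> > Columns;

// D-iteration where, in each cycle, a node is diffused if its fluid is
// above factor * (r / N), r being the residual fluid computed once per
// cycle. Stops when the residual is at most target_residual.
std::vector<double> di_op3(const Columns& cols,
                           const std::vector<double>& B,
                           double target_residual,
                           double factor = 0.9) {
    const std::size_t N = cols.size();
    assert(N > 0 && B.size() == N);
    assert(target_residual > 0.0 && factor > 0.0 && factor < 1.0);
    std::vector<double> F(B), H(N, 0.0);
    for (;;) {
        double r = 0.0;
        for (std::size_t i = 0; i < N; ++i) r += F[i];
        if (r <= target_residual) return H;
        const double threshold = factor * r / N;
        for (std::size_t i = 0; i < N; ++i) {
            if (F[i] <= threshold) continue;
            const double f = F[i];
            H[i] += f;
            F[i] = 0.0;
            for (std::size_t k = 0; k < cols[i].size(); ++k)
                F[cols[i][k].row] += cols[i][k].weight * f;
        }
    }
}
```

Because the largest fluid value is at least the average r/N, a factor below 1 guarantees that at least one node is diffused in every cycle, so no pass is spent only on testing the selection condition.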

When simplifying the node selection method, we obtain the
results presented in Table 4. We see that DI-OP3 is a
good compromise between the reduction of the number of
iterations and the reduction of the runtime. Note that the
results of DI-MAX2 clearly show a poor runtime performance
w.r.t. what is expected from the number of iterations: the
reason is that the number of nodes having fluid above the
value max_i (F_n)_i/10 is small and we end up spending
a lot of time testing the node selection condition. This
also explains why DI-OP3 works better for the runtime (fewer
useless test operations), whereas for the number of
iterations DI-MAX2 and DI-OP3 are much closer.

Table 5 gives the runtime comparison of the different approaches
when we eliminate all operations indirectly linked to the
iterations, such as normalization, convergence test (only kept
for DI-OP3) and printing results (time2). We see that its
impact (time2 compared to time1) can be neglected for PI,
GS and DI-CYC, much less so for DI-MAX2 and DI-OP3. We
see that for N = 10^6, we can improve PI by a factor 15 and
GS by a factor 10. This gain factor is much larger than the
ratio of the numbers of iterations. In order to explain this
difference, we also introduced time3, which is the runtime
obtained when the optimization option (-O2) was not used at
the compilation of the source code (C++). We see
that the results are then closer to the values predicted from
the number of iterations. However, we still observe a difference
of about a factor 2 on the DI variants.

4.5 Further analysing the speed-up factors 

We first show in Figures 4 and 5 the evolution of the
speed-up factor with N for the different approaches. We observe
a quite promising trend of the speed-up factor for the runtime:
Figure 5 shows that the gain factor due to the sequence
choice is significantly increasing with the size N.

             41.8   39.8   15.8
             1.6    1.7    4.2
time1 (s)    223    147    31     39     18
time2 (s)    221    147    29     31     14.4
speed-up     1      1.5    7.6    7.1    15.3
time3 (s)    496    323    164    169    70
speed-up     1      1.5    3.0    2.9    7.1

Table 5: Comparison of the runtime for a target error
of 1/N. m = 0. Except for PI, we applied the
diagonal terms elimination to all other approaches.
time1: with indirect operations; time2: without indirect
operations; time3: no compilation optimization.



Figure 4: Speed-up factor on the nb of iterations. 

For the considered P matrix associated to the web graph,
we can approximately decompose the speed-up gain factor
for N = 10^6 as follows (for runtime):

• Entry level update (GS): factor 1.5;

• Use of (all) column vectors instead of (all) line vectors:
factor 3 (compilation/processor optimization, cache memory
management); this factor is highly dependent on
the structure of the graph and on the compilation
optimization (with Java code, we observed very different
performance); this factor may be less than one (for
instance if P^t, the transposition of P, is considered);

• Impact of sequence choice: factor 4:

  - take columns with positive fluid (DI-CYC): factor 2;

  - better selected choice of columns (DI-OP3): factor 2.

Figure 5: Speed-up factor on the runtime.

As we observed in the previous section, there is a
significant gap between the computation cost in runtime and
in number of iterations. We suspect that the main reason
comes from a factor relative to the compilation level (such
as the optimization option, as we saw) or to the way the
processor manages the cache memory access. To validate our
assumption, we did the following test: we evaluated the
impact of the use of the column or line vectors of P in terms
of runtime. On the case N = 10^, we run the codes for
column and line iterators:

Code for line iterator:

    double result = 0.0;
    int count = 0;
    while (count < 100){
      count++;
      for (int i = 0; i < N; i++){
        for (list<int>::iterator j = line[i].begin();
             j != line[i].end(); j++){
          result += *j;
        }
      }
    }

where line[i] is the iterator on the i-th line of P (L_i(P)).

Code for column iterator:

    double result = 0.0;
    int count = 0;
    while (count < 100){
      count++;
      for (int i = 0; i < N; i++){
        for (list<int>::iterator j = column[i].begin();
             j != column[i].end(); j++){
          result += *j;
        }
      }
    }

where column[i] is the iterator on the i-th column of P
(C_i(P)).

By iterating the above schemes, we obtained a runtime of
21 (line) and 6 (column) seconds (a difference of factor 3.3).
When the compilation option is not used, we observed quite
close results. The reason for this difference may lie in the
property of the graph: Figure 6 shows the number of incoming
and outgoing links per node position. We clearly see
that the variance of the number of outgoing links is much
smaller than the variance of the number of incoming
links. We can expect that such a property is quite general
when the graph is built from human contributions: the
outgoing links of a node are likely to be produced by one person
or a small group of persons, whereas when a web site or
a content is very popular, it may receive a huge number
of incoming links (following a popularity law such as Zipf's
law).

Figure 6: Number of incoming and outgoing links
(x-axis: node identifier).



This means that for the operations with the line vector
(collection), we end up with lists of very variable sizes. The
operations with the column vector (diffusion) require lists of
more regular sizes, with less variance.

To better clarify the runtime speed-up results, we now
consider the different algorithms for the matrix P^t
(transposition of P). If the above explanation is consistent,
line-iterator operations should then be much faster
than column-iterator operations. The results are shown in
Table 6: they are roughly as expected. If the gain factor
from the line or column iteration is about 3.3, taking
P^t we should have an impact of about 3.3^2 ≈ 10, and this
is what we observe on the ratio 176/18 (time1 for DI-OP3).
This is of course a very approximate explanation, since the
behaviours of the compiler and of the processor are very
complex.

4.6 A sub-optimal scheme 

Based on the above results, we understood that in the real
computation cost of the iteration scheme, the optimization
of the iterator plays an important role. In particular, when
a column of P has too large a number of non-zero
entries, its diffusion should be carefully controlled (limited
as much as possible). This helped us to define the following
scheme:

• DI-SOP (sub-optimal compromise solution): D-iteration
with node selection if (F_n)_i > r_{n'} × #out_i/L, where
#out_i is the out-degree of i and r_{n'} is computed per
cycle n'.

P/P^t         PI         GS         DI-CYC     DI-OP3
N = 10^6
  nb iter     66/78      42/46      40/43      16/42
  speed-up    1/1        1.6/1.5    1.7/1.6    4.1/1.6
  time1 (s)   223/99     147/59     31/165     18/176
  speed-up    1/1        1.5/1.7    7.2/0.6    16.7/0.6
  time3 (s)   496/388    323/244    164/291    70/284
  speed-up    1/1        1.5/1.6    3.0/1.3    7.1/1.4

Table 6: Comparison of the runtime for a target error
of 1/N for P and P^t. m = 0. Except for PI, we
applied the diagonal terms elimination to all other
approaches. time1: with option -O2; time3: no
compilation optimization.

P/P^t         PI         GS         DI-CYC     DI-SOP
N = 10^5
  nb iter     52/51      37/38      35/35      14/17
  speed-up    1/1        1.4/1.3    1.5/0.5    3.7/3.0
  time1 (s)   9.0/5.0    6.9/3.8    2.2/7.2    1.0/3.6
  speed-up    1/1        1.3/1.3    4.1/0.7    9.0/1.4
  time3 (s)   26/20      18/16      11/15      5/8
  speed-up    1/1        1.4/1.2    2.3/1.3    5.6/2.5
N = 10^6
  nb iter     66/78      42/46      40/43      14/17
  speed-up    1/1        1.6/1.7    1.7/1.8    4.7/4.6
  time1 (s)   223/99     147/59     31/165     13.2/50
  speed-up    1/1        1.5/1.7    7.2/0.6    16.9/2.0
  time3 (s)   496/388    323/244    164/291    59.4/107
  speed-up    1/1        1.5/1.6    3.0/1.3    8.4/3.6

Table 7: Comparison of the runtime for a target error
of 1/N for P and P^t. m = 0. Except for PI, we
applied the diagonal terms elimination to all other
approaches. time1: with option -O2; time3: no
compilation optimization.



Table 7 shows that DI-SOP performs pretty robustly even
in the worst conditions (P^t). The intuition of DI-SOP is very
clear: we choose all nodes such that the unitary diffusion
cost (F_n)_i/#out_i is above the average diffusion cost r_{n'}/L.
Indeed, r_{n'}/L can be decomposed as r_{n'}/N (average fluid
per node) divided by L/N (average out-degree).
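
The DI-SOP selection rule can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes non-negative B and P, recomputes r_{n'} by a full sum at the start of each cycle, and adds a fallback cyclic pass when no node satisfies the strict inequality (for instance when every node sits exactly at the average cost); that fallback is our assumption. The names Entry, Columns, diffuse_node and di_sop are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sparse column of P: children of a node with weights p_ji.
struct Entry { std::size_t row; double weight; };
typedef std::vector<std::vector<Entry> > Columns;

void diffuse_node(const Columns& cols, std::size_t i,
                  std::vector<double>& F, std::vector<double>& H) {
    const double f = F[i];
    H[i] += f;
    F[i] = 0.0;
    for (std::size_t k = 0; k < cols[i].size(); ++k)
        F[cols[i][k].row] += cols[i][k].weight * f;
}

// Node i is diffused in a cycle if (F)_i > r * #out_i / L, where r is
// the residual fluid at the start of the cycle and L the number of links.
std::vector<double> di_sop(const Columns& cols,
                           const std::vector<double>& B,
                           double target_residual) {
    const std::size_t N = cols.size();
    assert(N > 0 && B.size() == N && target_residual > 0.0);
    std::size_t L = 0;
    for (std::size_t i = 0; i < N; ++i) L += cols[i].size();
    std::vector<double> F(B), H(N, 0.0);
    for (;;) {
        double r = 0.0;
        for (std::size_t i = 0; i < N; ++i) r += F[i];
        if (r <= target_residual) return H;
        bool diffused = false;
        for (std::size_t i = 0; i < N; ++i) {
            const double cost = (L > 0) ? r * cols[i].size() / L : 0.0;
            if (F[i] > cost) { diffuse_node(cols, i, F, H); diffused = true; }
        }
        if (!diffused) {  // assumed fallback: one plain cyclic pass
            for (std::size_t i = 0; i < N; ++i)
                if (F[i] > 0.0) diffuse_node(cols, i, F, H);
        }
    }
}
```

With this rule a node with many outgoing links has to accumulate proportionally more fluid before it is diffused, which is the control of large columns that motivated the scheme.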

Table 8 summarizes the results of the comparison for
different N, introducing m only for differentiation purposes.
Table 9 shows the results obtained when we set a large value
of the damping factor, d = 0.99 (this makes the global
convergence speed slower): the difference of performance is
better illustrated: it seems that there is more gain when more
iterations are required (we could guess it from Figure 3: only
the DI variants are linear). With N = 10^6, we gained here a
factor 36 in runtime.

We globally observe that when a sufficiently large m is
used, the computation time relative to PI is close to the
prediction (ratio of the numbers of iterations), at least for
GS and DI-CYC. For m = 0, it seems that the computation time
gain relative to PI for the DI variants may be higher than the
prediction when N is effectively large (for N = 10^5 and
N = 10^6 with Table 7), possibly due to the cache memory access
time optimization by the processor (likely to have fewer elements
in the cache with the DI variants): the impact of the compilation
or the processor level optimization is clearly very important,
but this is another complex research issue which is out of
the scope of this paper. We hope to address this problem in
future work.

              PI       GS       DI-CYC   DI-SOP
N = 10^3
  nb iter     54       31.4     29.1     18.2
  speed-up    1.0      1.7      1.9      3.0
m = 10^3
  time (s)    0.63     0.38     0.34     0.20
  speed-up    1.0      1.7      1.9      3.1
N = 10^4
  nb iter     67       44.6     39.0     17.8
  speed-up    1.0      1.5      1.7      3.8
m = 10^2
  time (s)    1.44     0.99     0.62     0.30
  speed-up    1.0      1.5      2.3      4.8
N = 10^5
  nb iter     77       50.7     48.6     20.0
  speed-up    1.0      1.5      1.6      3.9
m = 10
  time (s)    14.6     11.1     4.3      2.5
  speed-up    1.0      1.3      3.4      5.8
N = 10^6
  nb iter     92       55.7     53.7     19.5
  speed-up    1.0      1.7      1.7      4.8
m = 1
  time (s)    309      194      41.5     20.0
  speed-up    1.0      1.6      7.4      15

Table 8: Comparison of the runtime for a target
error of 0.01/N. m = 1000, 100, 10, 1. Except for PI, we
applied the diagonal terms elimination to all other
approaches.

              PI       GS       DI-CYC   DI-SOP
N = 10^3
  nb iter     399      303      268      111
  speed-up    1.0      1.3      1.5      3.6
m = 0
  time (s)    0.17     0.15     0.09     0.03
  speed-up    1.0      1.1      1.9      5.7
N = 10^4
  nb iter     544      480      404      71.3
  speed-up    1.0      1.1      1.3      7.6
m = 0
  time (s)    3.46     3.45     1.36     0.29
  speed-up    1.0      1.0      2.5      12
N = 10^5
  nb iter     790      579      543      117
  speed-up    1.0      1.4      1.5      6.8
m = 0
  time (s)    137      107      34       9.2
  speed-up    1.0      1.3      4.0      15
N = 10^6
  nb iter     1028     648      614      98
  speed-up    1.0      1.6      1.7      10
m = 0
  time (s)    3455     2257     480      95
  speed-up    1.0      1.5      7.2      36

Table 9: Comparison of the runtime for a target
error of 1/N with d = 0.99, m = 0. Except for PI, we
applied the diagonal terms elimination to all other
approaches.

L/N     D/N     E/N     O/N     max_in    max_out
1.67    0.48    0.91            199       164

Table 10: N = 9664.

4.7 Revisiting another dataset 

Below, we used the web graph gr0.California (available
on http://www.cs.cornell.edu/Courses/cs685/2002fa/).
The main motivation here was to try to understand the
unexpected (too large) gain observed in [J for this graph.

Figure 7: P matrix associated to gr0.California.

As Table 10 shows, this graph is very specific in that more
than 90% of the nodes are 0 in-degree nodes. This is quite
interesting, because it clearly illustrates the difference
between the collection (line vector use) and diffusion (column
vector use) approaches: because the 0 in-degree nodes converge
in finite iterations, GS recomputes 90% of redundant operations,
whereas with the diffusion approach, the 0 in-degree
nodes are very easily identified as nodes having 0 fluid and
DI-CYC will only apply diffusion on 10% of the nodes,
explaining the gain factor of almost 10 between PI/GS and
DI-CYC (Table 11).

Table 12 presents the results of the computation cost
associated to the matrix P^t for comparison.



5. CONCLUSION 

In this paper we revisited the D-iteration method with
a practical consideration of the computation cost: step by
step, we tried to understand and analyse the different
components of the runtime cost. This led us to a more practical
solution, DI-SOP, which seems to be a very good heuristic
candidate for the choice of the sequence for the diffusion.





            PI      GS      DI-CYC   DI-OP3   DI-SOP
nb iter     43      22      3.1      1.6      1.6
speed-up    1       2       14       27       27
time1       582     298     42       9.2      12.2
speed-up    1       2.0     14       63       48
time2       0.06    0.04    0.03     0.02     0.01
speed-up    1       1.5     2.0      3.0      6.0
time3       0.16    0.13    0.04     0.02     0.03
speed-up    1       1.2     4.0      8.0      5.3

Table 11: N = 9664. time1: m = 10^, time1': m = 10,
time2: m = 0 with -O2, time3: m = 0, no optimization option.
            PI      GS      DI-CYC   DI-OP3   DI-SOP
nb iter     28      16      5.6      2.0      1.8
speed-up    1       1.8     5.0      14       16
time1       379     217     75.2     17.7     9.3
speed-up    1       1.7     5.0      21       40
time2       0.04    0.04    0.02     0.02     0.01
speed-up    1       1.0     2.0      2.0      4.0
time3       0.11    0.10    0.05     0.04     0.03
speed-up    1       1.1     2.2      2.8      3.7

Table 12: For P^t, N = 9664. time1: m = 10^, time2:
m = 0 with -O2, time3: m = 0, no optimization option.



Figure 8: Comparison for gr0.California: computation
cost in time, m = 10 (curves: PI, DI-CYC, GS, DI-MAX, DI-OP).



